Welcome to the Soil Lipid Atlas. This help document explains how to use the Atlas to conduct statistical analyses. The Atlas has seven tabs:

  1. Home
  2. Select Data
  3. Strain Comparison
  4. By Lipid
  5. By Strain
  6. Predictions
  7. Downloads

Within this document, we provide an explanation of each tab as well as the statistical analyses that are conducted.

Home

The homepage currently has one section: “Dataset Information”. Figure 1 shows this section, which lists the experiments currently housed in the Soil Lipid Atlas along with their corresponding references.

Figure 1. Dataset Information section of the Home tab.

Select Data

The second tab, “Select Data”, allows users to select which combination of studies and treatments they would like to analyze. Currently, the Atlas houses 3 different studies: AMF Spores Prosser, Mature Tall Wheat Grass, and Bacterial/Fungal Isolates (Soil SFA). The first two studies have only one treatment condition each (Control – Growth Condition and Collected from Field Soil, respectively). In contrast, the Bacterial/Fungal Isolates (Soil SFA) study has four different growth conditions: control, heat, phosphorus depletion, and salt.

It is important to note that some tabs run statistical analyses on log2 fold changes between treatments. To ensure accurate results, these comparisons are limited to treatments from the same study. Therefore, for tabs that use log2 fold changes (“By Strain” and “By Lipid”), users can currently only use the data from the Bacterial/Fungal Isolates (Soil SFA) study. Figure 3 shows a scenario in which a user selects the control and heat treatments from the Bacterial/Fungal Isolates study. To confirm those choices, a user would then click the button “Select Isolate Studies”.

Figure 3. Users select control and heat treatments from the Bacterial/Fungal Isolates (Soil SFA) study.

After the treatments are selected, a pop-up appears confirming the studies and treatment conditions that were chosen, as shown in Figure 4.

Figure 4. The Soil Lipid Atlas provides a pop-up confirming the treatments/studies that were selected for statistical analyses.

Strain Comparison

The next tab, “Strain Comparison”, provides the first method for analyzing data. Because this page relies on presence/absence information (rather than log2 fold changes), it can be used regardless of the number of treatments selected and across multiple studies.

There are three different comparisons that users can select from: Proportion, Count, or Dimension Reduction.

The first comparison is “Proportion”. In this analysis, users select strains and lipid main classes from the selected data to observe how the proportions of the lipid main classes vary across strains. A lipid is considered “observed” if it is found in at least half of the samples from a specific strain. Figure 5 shows the lipidome of all 10 isolate strains from the Bacterial/Fungal Isolates (Soil SFA) study for 6 different main classes. For example, Triradylglycerols [GL03] make up around 40% of all lipids identified for Fusarium.

Figure 5. Boxplot showing the distribution of 6 different main classes across all 10 different isolates from the Bacterial/Fungal Isolates (Soil SFA) study.

If users instead select all lipid classes, the proportions of observed lipids sum to 1 (100%), accounting for the whole lipidome, as seen in Figure 6.

Figure 6. Boxplot showing the distribution of all main classes across all 10 different isolates from the Bacterial/Fungal Isolates (Soil SFA) study.
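
The “observed” rule and the proportions described above can be illustrated with a short Python sketch. This is not the Atlas's own code: the function names, the lipid names other than the Triradylglycerols [GL03] class label, and the detection values are all hypothetical. It assumes each lipid has one presence/absence flag per sample of a strain, and that proportions are taken relative to all observed lipids in that strain.

```python
from collections import Counter

def observed_lipids(detections, threshold=0.5):
    """Return lipids detected in at least `threshold` of a strain's samples.

    detections: dict mapping lipid name -> list of booleans (one per sample).
    """
    return {lipid for lipid, flags in detections.items()
            if flags and sum(flags) / len(flags) >= threshold}

def class_proportions(observed, main_class, selected_classes=None):
    """Share of observed lipids in each main class.

    The denominator is always the full set of observed lipids (an assumption),
    so a subset of classes sums to less than 1 and all classes sum to 1.
    """
    counts = Counter(main_class[lipid] for lipid in observed)
    total = sum(counts.values())
    if total == 0:
        return {}
    classes = list(counts) if selected_classes is None else selected_classes
    return {cls: counts[cls] / total for cls in classes}

# Hypothetical presence/absence data for four samples of one strain.
detections = {
    "TG 48:0": [True, True, False, True],     # 3 of 4 samples
    "TG 50:1": [True, True, False, False],    # exactly half, still observed
    "PE 32:0": [True, False, False, False],   # 1 of 4 samples, not observed
    "Cer 34:1": [True, True, True, True],
}
main_class = {
    "TG 48:0": "Triradylglycerols [GL03]",
    "TG 50:1": "Triradylglycerols [GL03]",
    "PE 32:0": "Glycerophosphoethanolamines [GP02]",
    "Cer 34:1": "Ceramides [SP02]",
}

observed = observed_lipids(detections)
all_props = class_proportions(observed, main_class)
subset_props = class_proportions(observed, main_class,
                                 ["Triradylglycerols [GL03]"])
print(all_props)
print(subset_props)
```

With all classes selected the proportions sum to 1, while selecting only some classes leaves a remainder, which matches the difference between Figure 5 and Figure 6.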

Additionally, each figure in the Atlas has an “Optional Graphics Update” dropdown to help users customize figures. Within this dropdown, users can add or update plot titles, axis labels, font sizes, and the height of the plot. For example, in Figure 7, the plot title is changed to “Distribution of Lipid Main Classes by Isolate”, an x-axis label of “Isolate” is added, the font sizes are adjusted, and the graph is made taller.

Figure 7. Boxplot showing the distribution of all main classes across all 10 different isolates from the Bacterial/Fungal Isolates (Soil SFA) study after adjusting for parameters in the “Optional Graphic Updates” dropdown.

The next comparison is the “Count” option. With this comparison, users select the isolate(s) of interest to determine the number of lipids from each main class that were observed in at least 50% of the samples. Users can calculate this number across all selected treatments (Control and Heat) or for only one treatment (such as just the Control). Figure 8 shows the count of lipids in both the Control and Heat treatments from the Bacterial/Fungal Isolates (Soil SFA) study.

Figure 8. Count plot of the number of observed lipids in at least 50% of all samples across both Heat and Control treatments from the Bacterial/Fungal Isolates (Soil SFA) study.

Further, users can also calculate the difference in counts between two treatments (for example, between Control and Heat). In that case, a positive value indicates that more lipids were observed in the first treatment selected, and a negative value indicates that more lipids were observed in the second treatment selected. Figure 9 shows the count plot where a positive value indicates that more lipids were observed in the Heat treatment and a negative value indicates that more were observed in the Control treatment.

Figure 9. Difference in count plot of the number of observed lipids in at least 50% of all samples across the Heat and Control treatments from the Bacterial/Fungal Isolates (Soil SFA) study.
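
The counting and differencing described above can be sketched in Python as follows. This is a minimal illustration, not the Atlas's implementation: the function names, lipid names, class labels, and detection flags are hypothetical, and it assumes the difference is computed as the first selected treatment minus the second.

```python
from collections import Counter

def count_by_class(detections, main_class, threshold=0.5):
    """Count lipids per main class that are seen in at least `threshold`
    of the samples. detections: lipid name -> list of booleans per sample."""
    counts = Counter()
    for lipid, flags in detections.items():
        if flags and sum(flags) / len(flags) >= threshold:
            counts[main_class[lipid]] += 1
    return counts

def count_difference(first, second):
    """Per-class difference, first treatment minus second treatment.

    Positive: more observed lipids in the first treatment.
    Negative: more observed lipids in the second treatment.
    """
    classes = sorted(set(first) | set(second))
    return {cls: first.get(cls, 0) - second.get(cls, 0) for cls in classes}

# Hypothetical data: three samples per treatment for one isolate.
main_class = {"TG 48:0": "TG", "TG 50:1": "TG", "Cer 34:1": "Cer"}
heat = {
    "TG 48:0": [True, True, True],
    "TG 50:1": [True, True, False],
    "Cer 34:1": [False, False, True],
}
control = {
    "TG 48:0": [True, False, False],
    "TG 50:1": [True, True, True],
    "Cer 34:1": [True, True, False],
}

heat_counts = count_by_class(heat, main_class)
control_counts = count_by_class(control, main_class)
diff = count_difference(heat_counts, control_counts)  # Heat selected first
print(diff)  # {'Cer': -1, 'TG': 1}
```

Here Heat is treated as the first treatment, so the positive TG value means more TG lipids were observed under Heat, and the negative Cer value means more were observed under Control.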

The last comparison on the “Strain Comparison” tab is “Dimension Reduction”. Here, users can select isolate(s) and plot a principal coordinates analysis (PCoA) (Gower 1966) to visualize dissimilarities between isolates. For omics data, this approach enables users to identify how different isolates vary from each other based on the presence/absence data. Dissimilarities are computed using the Bray-Curtis method (Bray and Curtis 1957), which is the default in the vegan package in R. Users can also color points by a variety of characteristics (such as fungal vs bacterial, gram-positive vs gram-negative, etc.). More comparisons are planned as the database grows. Figure 10 shows the PCoA of the heat and control samples colored by fungal status (either fungal or bacterial).

Figure 10. PCoA of the 10 isolates from the treatments Heat and Control colored by fungal status (fungal or bacterial).
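
To make the dissimilarity step more concrete, the following Python sketch computes Bray-Curtis dissimilarities on presence/absence profiles and applies the double-centering that PCoA uses before extracting axes. It is an illustration only: the Atlas itself relies on R, and the function names and the three example profiles here are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).

    For 0/1 presence/absence data this equals the Sorensen dissimilarity.
    Returning 0.0 for two empty profiles is a choice made for this sketch.
    """
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def dissimilarity_matrix(profiles):
    """profiles: dict name -> list of 0/1 flags over the same lipids."""
    names = list(profiles)
    matrix = [[bray_curtis(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix

def gower_center(d):
    """Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J.

    PCoA axes are the eigenvectors of B scaled by the square roots of the
    positive eigenvalues (the eigendecomposition is not shown here).
    """
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]

# Hypothetical presence/absence profiles over five lipids.
profiles = {
    "isolate_A": [1, 1, 1, 0, 0],
    "isolate_B": [1, 1, 0, 0, 0],
    "isolate_C": [0, 0, 0, 1, 1],
}
names, d = dissimilarity_matrix(profiles)
b = gower_center(d)
print(names)
print(d)
```

In this example isolates A and B share most lipids and are close (0.2), while C shares none with either and is maximally dissimilar (1.0). The centered matrix could then be passed to an eigendecomposition routine such as `numpy.linalg.eigh` to obtain the plotted coordinates.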

By Lipid

The next tab is the “By Lipid” page. On this tab, users can create barplots of fold changes between different treatments from the same study for individual lipids. Users first select a lipid of interest and then a strain in which that lipid was observed. Users can choose to analyze log2 fold changes for all pairwise comparisons, for comparisons to a control, or for comparisons that they specify themselves.

The log2 fold changes and corresponding p-values are calculated using ANOVA. Currently, no p-value adjustments are applied, but this will be implemented in the future. Additionally, the data are assumed to be normalized log2 abundance values. Figure 11 shows the barplot of the comparison Control vs Heat for the lipid “Cer 24:3;40|Cer 16:3;30/8:0;(2OH)_A” for the isolate strain, Dyadobacter. As there is only one possible comparison with two selected treatments, only one barplot is created.

Figure 11. Barplot for the comparison Control vs Heat for the lipid Cer 24:3;40|Cer 16:3;30/8:0;(2OH)_A for the isolate strain, Dyadobacter.
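
As a rough illustration of the calculation described above, the Python sketch below computes a log2 fold change and a one-way ANOVA F statistic for one lipid in one strain. It is not the Atlas's code. It assumes the inputs are already normalized log2 abundances (so the fold change is a difference of group means), the abundance values are made up, and the direction of the comparison (treatment minus reference) is an assumption.

```python
from statistics import mean

def log2_fold_change(treatment, reference):
    """Difference of group means; valid because values are already log2."""
    return mean(treatment) - mean(reference)

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical normalized log2 abundances for one lipid in one strain.
heat = [12.0, 12.5, 13.0]
control = [10.0, 10.5, 11.0]

lfc = log2_fold_change(heat, control)
f_stat, df1, df2 = one_way_anova_f([heat, control])
print(lfc, f_stat, df1, df2)
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, which could be obtained with, for example, `scipy.stats.f.sf(f_stat, df1, df2)`. With only two treatments, the F statistic equals the square of the two-sample t statistic with pooled variance.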

For lipids that are found in more than one strain, users can choose which isolate strain to analyze from a dropdown. For example, Figure 12 uses the same lipid as before but analyzes Streptomyces rather than Dyadobacter, which yields a non-significant fold change at a 0.05 significance threshold.

Figure 12. Barplot for the comparison Control vs Heat for the lipid Cer 24:3;40|Cer 16:3;30/8:0;(2OH)_A for the isolate strain, Streptomyces.

By Strain

The next tab is the “By Strain” page, in which users can create heatmaps showing the log2 fold changes between different treatments from the same study. As with the analyses on the “By Lipid” page, these fold changes and p-values are calculated using ANOVA, and the data are assumed to be normalized log2 abundance values. Users can select the specific strain they are interested in analyzing, an optional additional characteristic (such as Class), and the lipid classes they are interested in analyzing. In Figure 13, we select the isolate strain “Fusarium”, add the optional additional characteristic of abbreviation, and use all possible lipid classes. If more treatments were selected, users could specify custom pairwise comparisons as well.

Future updates to this tab include allowing users to order the lipids by name and/or clustering.

Figure 13. Heatmap showing the statistically significant log2 fold changes for the comparison Heat vs Control for the Bacterial/Fungal Isolates (Soil SFA) study.

Predictions

“Predictions” is the last analysis tab. On this page, users can select a comparison (such as fungal vs bacterial) and an assortment of lipids to predict whether each lipid is more commonly associated with one group or the other. These calculations are based on the proportion of samples in which that lipid was observed, and a Wilson confidence interval (Wilson 1927) is constructed. This method was selected because it provides binomial confidence intervals that are suitable for small samples. An estimate greater than 0.5 indicates that the lipid is more likely to be associated with the primary grouping, whereas a value less than 0.5 indicates the reverse. A molecule whose confidence interval contains 0.5 yields an inconclusive prediction. There are currently 6 predefined comparisons, and users can also define custom comparisons, as described later in this section. Figure 14 shows the confidence intervals for two lipids, “CL 61:0|CL 30:0_31:0” and “Adenosine 5’-monophosphate”: the former is predicted to be bacterial, and the latter yields an inconclusive result.

Figure 14. Line plot showing the estimate and confidence interval for 2 lipids for the comparison Fungal vs Bacterial.

Users can then select the Table dropdown to view this information in table format as well, as shown in Figure 15.

Figure 15. Line plot and table showing the estimate and confidence interval for 2 lipids for the comparison Fungal vs Bacterial.

For lipids that are found in only one group (e.g., a lipid identified in 10 fungal samples but 0 bacterial samples), no uncertainty metrics are calculated, and the estimated probability is simply 0 or 1 based on the type of samples in which the lipid was observed. Figure 16 shows that the lipids from the main class “PI” are all found only in fungal samples; therefore the estimate is 1 and there are no uncertainty bars.

Figure 16. Line plot showing the estimate and confidence interval for lipids from the class PI for the comparison Fungal vs Bacterial.
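
The prediction rule described in this section can be sketched in Python as follows. This is a hedged illustration rather than the Atlas's implementation. It assumes that the estimate is the share of lipid-positive samples that belong to the primary group, that the interval is a 95% Wilson score interval (z = 1.96), and that the plotted estimate is the raw proportion; none of these details are confirmed by this document. The function names and sample counts are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def predict_group(n_primary, n_secondary, z=1.96):
    """Return (estimate, interval, label) for one lipid.

    n_primary / n_secondary: number of samples in each group in which the
    lipid was observed (assumed definition of the proportion).
    """
    n = n_primary + n_secondary
    if n == 0:
        raise ValueError("lipid was not observed in either group")
    estimate = n_primary / n
    # Lipid seen in only one group: estimate is 0 or 1, no interval.
    if n_primary == 0 or n_secondary == 0:
        return estimate, None, "primary" if n_primary else "secondary"
    low, high = wilson_interval(n_primary, n, z)
    if low > 0.5:
        label = "primary"
    elif high < 0.5:
        label = "secondary"
    else:
        label = "inconclusive"  # interval contains 0.5
    return estimate, (low, high), label

# Hypothetical counts for a Fungal (primary) vs Bacterial (secondary) comparison.
print(predict_group(8, 2))    # interval spans 0.5, so inconclusive
print(predict_group(2, 18))   # interval entirely below 0.5, so bacterial
print(predict_group(10, 0))   # fungal only: estimate 1.0 with no interval
```

The Wilson interval stays inside [0, 1] and behaves reasonably for small sample sizes, which is why a lipid seen in 8 fungal and 2 bacterial samples is still reported as inconclusive here (its interval is roughly 0.49 to 0.94).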

Further, users can run custom comparisons by selecting the option “Custom Comparison” in the dropdown. Selecting this option creates two additional dropdowns where users can specify one or more strains as the first group and one or more strains as the second group. For example, Figure 17 shows a case in which users want to determine whether a variety of fatty acids are found in either Fusarium or Soliccocus.

Figure 17. Line plot showing the estimate and confidence interval for a variety of fatty acids for the custom comparison Fusarium vs Soliccocus.

Currently, this tab is based on the presence/absence of the lipid of interest. In the future, we plan to expand the model to assess the presence of other lipids that may aid in prediction. In other words, we aim to determine whether other biomarkers are indicative of the condition for a particular lipid.

Downloads

The last tab is the “Downloads” page. As users select plots to save throughout the app, these plots are stored. On the Downloads page, users will find a table listing all the plots that have been saved during the session. Figure 18 shows how clicking a row in the table displays a preview of the plot. By default, all saved plots are considered for download (as shown in the table at the bottom of the screen).

Figure 18. On the Downloads page, users can preview the plots that they have saved throughout their session on the App.

However, if a user decides to remove a plot from the list, they can click the button “Remove selected plot”. For example, Figure 19 shows that by removing the fourth plot, the table showing plots prepped for download has been reduced from seven plots to six.

Figure 19. Users can remove plots that they do not want to download.

To download the plots, users select the button “Confirm Bundle” to lock in the plots and then “Download Bundle” to download a zipped folder with the figures.

References

Bray, J. Roger, and John T. Curtis. 1957. “An Ordination of the Upland Forest Communities of Southern Wisconsin.” Ecological Monographs 27 (4): 326–49.
Gower, J. C. 1966. “Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis.” Biometrika 53 (3-4): 325–38.
Wilson, Edwin B. 1927. “Probable Inference, the Law of Succession, and Statistical Inference.” Journal of the American Statistical Association 22 (158): 209–12.